From a Distance: Using Cross-lingual Word Alignments for Noun Compound Bracketing
نویسندگان
چکیده
We present a cross-lingual method for determining NP structures. More specifically, we try to determine whether the semantics of tripartite noun compounds in context requires a left or right branching interpretation. The system exploits the difference in word position between languages as found in parallel corpora. We achieve a bracketing accuracy of 94.6%, significantly outperforming all systems in comparison and comparable to human performance. Our system generates large amounts of high-quality bracketed NPs in a multilingual context that can be used to train supervised learners.
منابع مشابه
Word Alignment Based on Bilingual Bracketing
In this paper, an improved word alignment based on bilingual bracketing is described. The explored approaches include using Model-1 conditional probability, a boosting strategy for lexicon probabilities based on importance sampling, applying Parts of Speech to discriminate English words and incorporating information of English base noun phrase. The results of the shared task on French-English, ...
متن کاملMultiword noun compound bracketing using Wikipedia
This research suggests two contributions in relation to the multiword noun compound bracketing problem: first, demonstrate the usefulness of Wikipedia for the task, and second, present a novel bracketing method relying on a word association model. The intent of the association model is to represent combined evidence about the possibly lexical, relational or coordinate nature of links between al...
متن کاملA Dataset for Joint Noun-Noun Compound Bracketing and Interpretation
We present a new, sizeable dataset of noun– noun compounds with their syntactic analysis (bracketing) and semantic relations. Derived from several established linguistic resources, such as the Penn Treebank, our dataset enables experimenting with new approaches towards a holistic analysis of noun–noun compounds, such as jointlearning of noun–noun compounds bracketing and interpretation, as well...
متن کاملBilBOWA: Fast Bilingual Distributed Representations without Word Alignments
We introduce BilBOWA (Bilingual Bag-ofWords without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data. Instead it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-alig...
متن کاملLearning Cross-lingual Word Embeddings via Matrix Co-factorization
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix cofactorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matric...
متن کامل